Training course: Plotting Data for Communication and Exploration

Dianne Cook
Monash University
Produced for e61, September 23, 2024

Session 2: Creating communication graphics (mostly)


Time (min)  Topic
30          Determining which plot is the most effective
15          Representing uncertainty
15          Managing multivariate data
20          Mapping spatial data

Which plot is the most effective?

Creating null samples

What would NOT be interesting? Some possible null patterns:

  • a scatterplot with points spread everywhere
  • a histogram with a bell shape
  • side-by-side boxplots with the same median and box
  • no difference between colour groups

Assess the plot design by embedding it among a field of plots made using the same design on null data.

Measure the frequency at which readers identify the data plot.
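A minimal sketch of building a lineup in Python. It assumes a scatterplot design and null data made by permuting one variable, which is a common null for testing association between x and y (other nulls, such as simulating from a fitted model, are also used). The function `make_lineup` and its parameters are hypothetical. In R, the nullabor package provides `lineup()` and `null_permute()` for this job.

```python
import random

def make_lineup(x, y, m=20, seed=None):
    """Return m panels (m - 1 nulls plus the real data) and the data's position.

    Each null panel keeps x as is and randomly permutes y, which breaks any
    x-y association while leaving both marginal distributions unchanged.
    """
    rng = random.Random(seed)
    position = rng.randrange(m)  # hidden location of the real data panel
    panels = []
    for i in range(m):
        if i == position:
            panels.append((list(x), list(y)))
        else:
            y_null = list(y)
            rng.shuffle(y_null)  # permutation null
            panels.append((list(x), y_null))
    return panels, position
```

Each panel is then drawn with the same plot design, and `position` stays hidden until readers have made their choice.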

Connection to statistics

  • Using a mapping to specify a plot creates a statistic.
  • With a statistic we can explore its distribution.
  • What might the values look like under different scenarios?

\[\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i\]
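One way to explore the distribution of the statistic X-bar is simulation. The sketch below assumes, purely for illustration, that the null is a normal distribution with mean `mu` and standard deviation `sigma`; the function names are hypothetical.

```python
import random
import statistics

def sample_mean(xs):
    """X-bar = (1/n) * sum of X_i."""
    return sum(xs) / len(xs)

def simulate_mean(n, reps=1000, mu=0.0, sigma=1.0, seed=None):
    """X-bar for `reps` samples of size n, each drawn from N(mu, sigma^2)."""
    rng = random.Random(seed)
    return [sample_mean([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(reps)]

means = simulate_mean(n=25, reps=2000, seed=1)
spread = statistics.stdev(means)  # should be near sigma / sqrt(n) = 0.2
```

A histogram of `means` shows what values of X-bar look like when nothing interesting is going on; changing `mu`, `sigma` or `n` shows other scenarios.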

Statistical power is the probability that the test detects a difference when the true value of the statistic really does differ from the null.

Show the lineup of plots to a set of observers and record the number of detections (observers who pick out the data plot). The plot design with more detections has higher power, that is, a stronger signal.
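If each observer's pick is treated as independent and, under the null, equally likely to land on any of the m panels, then the number of detections among K observers is Binomial(K, 1/m). A simplified visual p-value is the probability of at least as many detections by chance. The sketch below makes those simplifying assumptions explicit; the function name is hypothetical.

```python
from math import comb

def visual_p_value(detections, observers, m=20):
    """P(X >= detections) where X ~ Binomial(observers, 1/m).

    Simplifying assumptions: observers judge independently, and under the
    null each of the m panels is equally likely to be picked.
    """
    p = 1 / m
    return sum(
        comb(observers, k) * p**k * (1 - p) ** (observers - k)
        for k in range(detections, observers + 1)
    )
```

For example, a single observer picking the data plot out of a 20-panel lineup gives p = 1/20 = 0.05.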

Let’s try one

We need to break you into two groups.

  • Group 1: birthday is between Jan 1 and Jun 30
  • Group 2: Everyone else

When your group is labelled, close your eyes. No peeking!

Testing



Which plot is the most different? (20 seconds)





Compute signal strength:
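A straightforward measure of signal strength for a design is the proportion of its viewers who found the data plot. In the sketch below the counts are made up for illustration, and the assumption that each group viewed a different design is just that, an assumption about how the exercise is run; the function name is hypothetical.

```python
def signal_strength(results):
    """Proportion of viewers who identified the data plot, per plot design.

    `results` maps a design name to a (detections, viewers) pair.
    """
    return {design: detects / viewers for design, (detects, viewers) in results.items()}

# Hypothetical counts, for illustration only
results = {"design shown to Group 1": (9, 15), "design shown to Group 2": (4, 16)}
strength = signal_strength(results)
best = max(strength, key=strength.get)  # design with the higher detection rate
```

With these made-up counts the Group 1 design has the stronger signal (0.6 versus 0.25). With small groups, a gap like this can also arise by chance.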

Incorporating uncertainty into plots

Sampling

There are many types of uncertainty. One is sampling variability: what would the plot look like if we had a different sample?
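One common way to mimic "a different sample" is the bootstrap: resample the observed data with replacement and recompute the statistic each time. This is a swapped-in illustration, not necessarily the approach used in the course; the function names are hypothetical, and the mean stands in for whatever statistic the plot shows.

```python
import random
import statistics

def bootstrap_stats(xs, stat=statistics.fmean, reps=1000, seed=None):
    """Recompute `stat` on `reps` resamples drawn with replacement from xs."""
    rng = random.Random(seed)
    return [stat(rng.choices(xs, k=len(xs))) for _ in range(reps)]

def percentile_interval(values, level=0.95):
    """Simple percentile interval read off the sorted bootstrap values."""
    s = sorted(values)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi
```

Drawing each bootstrap statistic (or each resampled fitted line) faintly on the same axes is one way to make this sampling uncertainty visible in the plot itself.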

Managing multivariate data

Scatterplot matrix

Plot:

  • all the pairs of variables.
  • univariate distributions.
  • optionally, the correlations.

Interactivity allows examining relationships between more than two variables.

Selecting points with a square “brush” shows where those observations lie in the other plots (the other pairs of variables).
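Linked brushing comes down to selecting row indices in one panel and reusing them in every other panel. A minimal sketch, assuming the data are stored as a dict of equal-length columns and the brush is a rectangle given by hypothetical `xlim` and `ylim` ranges:

```python
def brush(data, xvar, yvar, xlim, ylim):
    """Row indices whose (xvar, yvar) values fall inside a rectangular brush."""
    (x0, x1), (y0, y1) = xlim, ylim
    return {
        i
        for i, (x, y) in enumerate(zip(data[xvar], data[yvar]))
        if x0 <= x <= x1 and y0 <= y <= y1
    }

def highlight(data, xvar, yvar, selected):
    """Points for another panel, each flagged True if its row was brushed."""
    return [(x, y, i in selected) for i, (x, y) in enumerate(zip(data[xvar], data[yvar]))]
```

In an interactive tool, `brush` would be re-run whenever the rectangle moves, and every panel redrawn using the flags from `highlight`.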

Mapping spatial data

What is a map? (1/2)

What is a map? (2/2)

A map is a collection of points that define polygons.
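A sketch of that structure, assuming map data in long format with one row per vertex, a polygon identifier and a drawing order (similar in spirit to what ggplot2's `map_data()` returns). The tuple layout and function name here are hypothetical.

```python
from collections import defaultdict

def rows_to_polygons(rows):
    """Group long-format map rows into one closed ring of vertices per polygon.

    Each row is (polygon_id, order, longitude, latitude). Vertices are sorted
    by `order`, and the first vertex is repeated at the end if needed.
    """
    grouped = defaultdict(list)
    for pid, order, lon, lat in rows:
        grouped[pid].append((order, lon, lat))
    polygons = {}
    for pid, pts in grouped.items():
        pts.sort()  # sorts by drawing order first
        ring = [(lon, lat) for _, lon, lat in pts]
        if ring[0] != ring[-1]:
            ring.append(ring[0])  # close the polygon
        polygons[pid] = ring
    return polygons
```

A plotting function then draws each ring as a filled polygon; the number of rows is what thinning reduces.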

Special care:

  • Maps for data analysis do NOT need full cartographic detail. A first step is usually to THIN the map: reduce the resolution and use a smaller object, so that it plots quickly!
  • Set the aspect ratio so that the map has its familiar shape.
  • Special spatial projections are often used.
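Thinning can be done in several ways. One well-known algorithm is Ramer-Douglas-Peucker, which keeps a vertex only if dropping it would move the boundary by more than a tolerance. This is a swapped-in illustration, not a claim about which tool the course uses; the function names are hypothetical, and `tolerance` is in the same units as the coordinates.

```python
import math

def _point_line_distance(p, a, b):
    """Perpendicular distance from p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate chord, e.g. a closed ring
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)

def thin(points, tolerance):
    """Ramer-Douglas-Peucker simplification of a sequence of (x, y) points."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    # Find the interior vertex farthest from the chord a-b
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= tolerance:
        return [a, b]  # every interior vertex is within tolerance: drop them
    left = thin(points[: idx + 1], tolerance)
    right = thin(points[idx:], tolerance)
    return left[:-1] + right  # avoid repeating the split vertex
```

A larger tolerance removes more vertices; the right value depends on how large the map will be drawn.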

Constructing a choropleth

The numerical value of a statistic is attached to the respective polygon, and each polygon is filled with a colour determined by that value.
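A sketch of the value-to-colour step, assuming quantile classes (one common choice among several, such as equal-width bins) and a palette ordered from low to high. The function name and palette labels are hypothetical; older Python versions need at least two values for `statistics.quantiles`.

```python
import statistics

def choropleth_fills(values, palette):
    """Colour for each region, using quantile classes of its statistic.

    values: dict mapping region -> numeric statistic
    palette: list of colours ordered from low to high values
    """
    cuts = statistics.quantiles(values.values(), n=len(palette))  # len(palette) - 1 cut points
    fills = {}
    for region, v in values.items():
        cls = sum(v > c for c in cuts)  # how many cut points lie below v
        fills[region] = palette[cls]
    return fills
```

Quantile classes put roughly equal numbers of regions in each colour, which keeps every colour in use but can exaggerate small differences.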



But there is a problem, especially for Australia: geographically small but densely populated areas get lost.

Making the small regions visible (1/2)

A cartogram expands or shrinks each geographic area according to the population in that area.
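One simple variant is a non-contiguous cartogram: each region is scaled about its own centre so that its area becomes population divided by a reference density. The sketch below is an illustration under stated assumptions, not the method behind the course's figures: it uses the vertex average as the centre, takes polygons as vertex lists without a repeated closing point, and its function names are hypothetical. Choosing the highest regional density as the reference keeps the densest region at its true size and shrinks the others.

```python
import math

def centroid(ring):
    """Vertex average of a polygon (a simple stand-in for the area centroid)."""
    xs, ys = zip(*ring)
    return sum(xs) / len(xs), sum(ys) / len(ys)

def shoelace_area(ring):
    """Unsigned polygon area via the shoelace formula."""
    n = len(ring)
    s = sum(
        ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
        for i in range(n)
    )
    return abs(s) / 2

def scale_region(ring, population, ref_density):
    """Scale a region about its centre so its new area is population / ref_density."""
    factor = math.sqrt(population / (shoelace_area(ring) * ref_density))  # linear scale
    cx, cy = centroid(ring)
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in ring]
```

Area scales with the square of the linear factor, which is why the square root appears.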



See more on cartograms here.

A better solution for Australia is needed, though.

Making the small regions visible (2/2)


Learn more about hexagon tiling that works better for Australia here.
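A rough sketch of the idea behind a hexagon tile map: lay a grid of equally sized hexagons over the map, then give each region its own hexagon near its geographic centroid. The greedy allocation below and all names in it are hypothetical simplifications, not the algorithm behind the linked work. The result depends on the order in which regions are processed, and there must be at least as many hexagons as regions.

```python
import math

def hex_centres(xmin, xmax, ymin, ymax, size):
    """Centres of a pointy-top hexagon grid covering a bounding box.

    Neighbouring centres are `size` apart: rows are spaced size * sqrt(3) / 2
    apart, and every other row is shifted by size / 2.
    """
    dy = size * math.sqrt(3) / 2
    centres = []
    row, y = 0, ymin
    while y <= ymax + dy:
        x = xmin + (size / 2 if row % 2 else 0.0)
        while x <= xmax + size:
            centres.append((x, y))
            x += size
        y += dy
        row += 1
    return centres

def allocate(region_centroids, centres):
    """Greedily give each region the nearest hexagon centre not yet taken."""
    free = list(centres)
    allocation = {}
    for region, (rx, ry) in region_centroids.items():
        best = min(free, key=lambda c: math.hypot(c[0] - rx, c[1] - ry))
        allocation[region] = best
        free.remove(best)  # each hexagon holds at most one region
    return allocation
```

Because every region gets a hexagon of the same size, small but densely populated areas stay visible, at the cost of some distortion in position.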

Adding interaction

Resources

End of session 2

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.